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Abstract — In a recent paper, the authors proposed a new class 
of low-complexity iterative thresholding algorithms for recon- 
structing sparse signals from a small set of linear measurements 
fn . The new algorithms are broadly referred to as AMP, for 
approximate message passing. This is the second of two conference 
papers describing the derivation of these algorithms, connection 
with related literature, extensions of original framework, and 
new empirical evidence. 

This paper describes the state evolution formalism for analyz- 
ing these algorithms, and some of the conclusions that can be 
drawn from this formalism. We carried out extensive numerical 
simulations to confirm these predictions. We present here a few 
representative results. 

I. General AMP and State Evolution 
We consider the model 

y^Aso + Wo, So eR^, y,Wo , (1) 

with So a vector that is 'compressible' and Wo a noise vector 
We will assume that the entries of Wo are centered independent 
gaussian random variables with variance v. 

The general AMP (approximate message passing) algorithm 
reads 

- f]t{x' + A*z*), (2) 

2* - y-Ax'+-z'-^{4_,{x'-^+A*z'-^)),0) 
d 

with initial condition xq = 0. Here, for a vector u = 
{ui,...,un) we write (u) = J^iLi^i/N, and ??'(•;•) 
indicates the derivative of i] with respect to its first argument. 
Further 6 = n/N and {??*(• )}t>o is a sequence of scalar 
non-linearities (see Section III), a typical example being soft 
thresholding, which contracts its argument towards zero. 

A. Structure of the Algorithm 

This algorithm is interesting for its low complexity: its 
implementation is dominated at each step by the cost of 
applying A and A* to appropriate vectors. In some important 
settings, matrices A of interest can be applied to a vector 
implicitly by a pipeline of operators requiring log(iV) flops; 
an example would be A whose rows are randomly chosen from 
among the rows of a Fourier matrix; then Ax can be computed 
by EFT and subsampling. 

Even more, the algorithm is interesting for the message 
passing term |z*~^(77j_]^(a;*~^+A*z*~^)). Similar algorithms 



without this term are common in the literature of so-called iter- 
ative thresholding algorithms. As discussed in the companion 
paper, the message passing term approximates the combined 
effect on the reconstruction of the passing of nN messages in 
the the full message passing algorithm. 

The message passing term completely changes the statis- 
tical properties of the reconstruction, and it also makes the 
algorithm amenable to analysis by a technique we call State 
Evolution. Such analysis shows that the algorithm converges 
rapidly, much more rapidly than any known result for the 
1ST algorithm. Furthermore, it allows us to make a variety 
of theoretical predictions about performance characteristics of 
the algorithm which are much stronger than any predictions 
available for competing methods. 

B. State Evolution 

In the following we will assume that the columns of A are 
normalized to unit Euclidean length. We define the effective 
variance 

a{xtf^v+^^\\xt-s^\\l. (4) 

The effective variance combines the observational variance 
V with an additional term ||a;f — so||| that we call the 
interference term. Notice that v is merely the squared recon- 
struction error of the naive 'matched filter' for the case where 
So contains all zeros and a single nonzero in a given position 
i and the matched filter is just the i-th column of A. 

The interference term measures the additional error in 
estimating a single component of Sq ^ that is caused by the 
many small errors in other components j ^ i. The formula 
states that the effective variance at iteration t is caused by the 
observational noise (invariant across iteration) and the current 
errors at iteration t (changing from iteration to iteration). The 
interference concept is well known in digital communications, 
where phrases like mutual access interference are used for 
what is algebraically the same phenomenon. 

We will let at denote any estimate of at, and we will assume 
that at « at; see ||T| for more careful discussion. Suppose that 
the nonlinearity takes the form r]t{-) — ri{-]Ot) where is 
a tuning parameter, possibly depending on at\ see below for 
more. Let T denote the collection of CDFs on R and F be the 
CDF of so{i). Define the MSE map ^' : R+ x IR"^ x R+ 



by 

^{a;v,6,0t,F) = V + ^E{[^t(,X + a Z) - Xf} 

where X has distribution F and Z ^ JV{0, 1) is independent 
of X. We suppose that a rule 6(cr; v, S, 9, F) for the update 
of 9t is also known. 

Definition I.l. The state is 3. 5-tuple S = (fTj 9^ F^', state 
evolution is the evolution of the state by the rule 

ial,v,S,9t,F) ^ i^{a^);v,S,9t+i,F) 
t h-> t + l 

As the parameters {v, S, F) remain fixed during evolution, we 
usually omit mention of them and think of state evolution 
simply as the iterated application of and Q: 

9t ^ 9t+i = eiSt) 

t t + 1 

The initial state is taken to have CTq u + ||so||2/^<^- 

As described. State Evolution is a purely analytical con- 
struct, involving sequential application of rules and Q. The 
crucial point is to know whether this converges to a fixed point, 
and to exploit the properties of the fixed point. We expect that 
such properties are reflected in the properties of the algorithm. 
To make this precise, we need further notation. 

Definition 1.2. State-Conditional Expectation. Given a func- 
tion C : IR'* I— > IR, its expectation in state St is 

£{C\St) = E {C(C/, V, W, 7i{U + V + W))} , 

where U ^ F,V ^ iV(0, v) and W - A^(0, ct^ _ y). 

Different choices of C, allow to monitor the evolution of 
different metrics under the AMP algorithm. For instance, Q ~ 
{u — xY corresponds to the mean square error (MSE). The 
False Alarm Rate is tracked by C = l{r)(u+«;)#o} and the 
Detection Rate by C = l{r,(u+v+w)^o}- 

Definition 1.3. Large-System Limits. Let Q : H'^ ^ 'E. be 

a function of real 4-tuples (s,u,w,x). Suppose we run the 
iterative algorithm A for a sequence of problem sizes {n, N) 
at a the value {v,d,F) of underlying implicit parameters, 
getting outputs Xt, t = 1,2,3,... The large-system limit 
ls.lim(C, t. A) of C at iteration t is 

ls.lim(C,t,yl) = P-lim^_^^(C(so,i,Mt,i, Wo,i,a;t,j))Ar , 

where {■)n denotes the uniform average over i G 
{1, . . . , N} = [N], and p.lim denotes limit in probability. 

Hypothesis 1.4. Correctness of State Evolution for AMP. 

Run an AMP algorithm for t iterations with implicit state 
variables v,S, F. Run state evolution, obtaining the state St at 
time t. Then for any bounded continuous function C, ■ K,^ H 
of the real 4-tuples (s, u, x), and any number of iterations 
t, 



1) The large-system limit ls.lim((^,t,A) exists for the ob- 
servable Q at iteration t. 

2) This limit coincides with the expectation £(C,\St) com- 
puted at state St- 

State evolution, where correct, allows us to predict the 
performance of AMP algorithms and tune them for optimal 
performance. In particular, SE can help us to choose the non- 
linearities {ryt} and their tuning. The objective of the rest of 
this paper is twofold: (1) Provide evidence for state evolution; 
(2) Describe some guidelines towards the choice of the non- 
linearities {rjt}. 

II. AMP-Based algorithms 

Already in [IJ we showed that a variety of algorithms 
can be generated by varying the choice of rj. We begin with 
algorithms based on soft thresholding. Here rit{x) = rj{x]9t) 
is given by the soft threshold function 

( x-9 \i9 <x, 
r]{x; 9) = i if -9 < x < 9, (5) 

[ x + 9 if X <-9. 

This function shrinks its argument towards the origin. Several 
interesting AMP-Based algorithms are obtained by varying the 
choice of the sequence {^tjtgiN. 

A. AMP.M(<5) 

The paper |T| considered the noiseless case v = where 
the components of Sq are iid with common distribution F that 
places all but perhaps a fraction e = p{S) -6, p £ (0, 1) of its 
mass at zero. That paper proposed the choice 

9t = Ti6)at . (6) 

where an explicit formula for t{5) is derived in the online 
supplement [2 |. As explained in that supplement, this rule has 
a minimax interpretation, namely, to give the smallest MSE 
guaranteed across all distributions F with mass at zero larger 
than or equal to 1 — e. 

B. AMP.T(r) 

Instead of taking a worst case viewpoint, we can think of 
specifically tuning for the case at hand. Consider general rules 
of the form: 

9t^rdf (7) 

Such rules have a very convenient property for state evolution; 
namely, if we suppose that at = at, we can redefine the 
state as {af;v,6,T,F), with {v,6,t,F) invariant during the 
iteration, and then the evolution is effectively one-dimensional: 
at 1-^ <^t+i = '^{'^'t)- The dynamics are then very easy to 
study, just by looking for fixed points of a scalar function ^I*. 
(This advantage is also shared by AMP.M((5), of course). 

While the assumption at = at does not hold, strictly 
speaking, at any finite size, it will hold asymptotically in the 
large system limit for many good estimators of the effective 
variance. 



It turns out that, depending on F and S, different values 
of T lead to very different performance characteristics. It is 
natural to ask for the fixed value t = t*{v, S, F) which, under 
state evolution gives the smallest equilibrium MSE. We have 
developed software to compute such optimal tuning; results 
are discussed in ||5J. 

C. AMP.A(A) 

In much current work on compressed sensing, it is desired 
to solve the -penalized least squares problem 
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minimize — Axjjj + A||.t||i 



(8) 



In different fields this has been called Basis Pursuit denoising 
@ or Lasso |7 |. Large scale use of general convex solvers is 
impractical when A is of the type interesting from compressed 
sensing, but AMP-style iterations are practical. And, surpris- 
ingly an AMP-based algorithm can effectively compute the 
solution by letting the threshold 'float' to find the right level 
for solution of the above problem. The threshold recursion is: 

et+i = X+j{7/{x' + A*z';et)). (9) 

D. AMP.O 

It can also be of interest to solve the £i -minimization 
problem 

min llxjli subject to y — Ax. (10) 

X 

This has been called Basis Pursuit |6| in the signal processing 
literature. While formally it can be solved by linear program- 
ming, standard linear program codes are far too slow for many 
of the applications interesting to us. 

This is formally the A = case of AMP.A(A). In fact it can 
be advantageous to allow A to decay with the iteration number 



t+i = Xt + jW{x' + A*z';et)) 



(11) 



Here, we let At j as f ^ oo. 

E. Other N online arities 

The discussion above has focused entirely on soft thresh- 
olding, but both the AMP algorithm and SE formalism make 
perfect sense with many other nonlinearities. Some case of 
specific interest include 

• The Bayesian conditional mean: rj{x) — E{so|so + U + 
V — x}, where U and V are just as in Definition 1.2. This 
is indeed discussed in the companion paper Q, Section 
V. 

• Scalar nonlinearities associated to various nonconvex op- 
timization problems, such as minimizing £p pseudonorms 
for p < 1. 



III. Consequences of State Evolution 

A. Exponential Convergence of the Algorithm 

When State Evolution is correct for an AMP-type algorithm, 
we can be sure that the algorithm converges rapidly to its 
limiting value - exponentially fast. The basic point was 
shown in |T|. Suppose we are considering either AMP.M((5) 
or AMP.T(r). In either case, as explained above, the state 
evolution is effectively one-dimensional. Then the following 
is relevant. 

Definition III.l. Stable Fixed Point. The Highest Fixed Point 
of the continuous function is 

HFP(^') = sup{m : ^{m) > to}. 

The stability coefficient of the continuously differentiable func- 
tion is 

SC(^') '^-^{m) 

m=HFP(*) 

We say that HFP(^') is a stable fixed point ifO < SC(^') < 1. 

Let H2{F) = J x^dF denote the second-moment functional 
of the CDF F. 

Lemma III.2. Let ^(•) = ^{■■,v,S,F). Suppose that 
^i2{F) > HFP(^'). The sequence of iterates af defined by 
starting from CTq = H2{F) and (J^^i = ^'(Ci) converges: 

(t2^HFP(^'), t^oo. 

Suppose that the stability coefficient < SC(^') < 1. Then 

{aj ~ HFP(^')) < SC(^')' • {^l2{F) ~ HFP(^')). 

In short, when F and v are such that the highest fixed point 
is stable, state evolution converges exponentially fast to that 
fixed point. 

Other iterative thresholding algorithms have theoretical 
guarantees which are far weaker. For example, FISTA 1 8 1 has 
a theoretical guarantee of 0{\/t^), while SE evolution implies 
0(exp(-ci)). 

B. Phase Transitions For £i minimization 

Consider the special setting where the noise is absent Wo — 
and the object Sq obeys a strict sparsity condition; namely 
the distribution F places a fraction > 1 — e of its mass at the 
origin; and thus, if Sq is iid F, approximately • (1 — e) of 
its entries are exactly zero. 

A phase transition occurs in this setting when using £i 
minimization for reconstruction. Namely, if we write e = p - 5 
then there is a critical value p{S) such that, for e < p{S) ■ S, £i 
minimization correctly recovers So, while for e > p{5) ■ 6, l\ 
minimization fails to correctly recover So, with probability ap- 
proaching one in the large size Umit. State Evolution predicts 
this phenomenon, because, for e < p^-^iS) -8, the highest fixed 
point is at a1 — 0, while above this value, the highest fixed 
point is at al > 0. Previously, the exact critical value p{6) at 
which this transition occurs was computed by combinatorial 
geometry, with a rigorous proof; however, it was shown in HI 
that the algorithm AMP.M(5) has p{S) = psEiS), vaUdating 
the correctness of SE. 
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Fig. 1. Observables versus iteration, and predictions by state evolution. Panels 
(a)-(d): MSENZ, MSB, MDR, FAR. Curve in red: theoretical prediction. Curve 
in blue: mean observable. For this experiment, N = 5000, <5 = n/N = .3. 
F = 0.955(5o + 0.045<5i 

C. Operating Characteristics of ii penalized Least-squares. 

State evolution predicts the following relationships between 
AMP.T(t) and BPDN(A). AMP.T(r) has, according to SE, 
for its large-t limit an equilibrium state characterized by its 
equilibrium noise plus interference level <Joo{t). In that state 
AMP.T(r) uses an equilibrium threshold Ooo{t). Associated 
to this equilibrium NPI and Threshold, there is an equilibrium 
detection rate 

EqDR(T) = FMU +V + W;e^)^ 0} 

where U ^ F, V is N{0,v) and W is iV(0,cr^ - v), with 
U,V,W independent. Namely, for all sufficiently large r (i.e 

T > tq{S, F, v)) we have 

A-(l-EqDR(r)/,5)-0oo(T); 

this creates a one-one relationship A ^ t(A; v, S, F) cal- 
ibrating the two families of procedures. SE predicts that 
observables of the £i -penalized least squares estimator with 
penalty A will agree with the calculations of expectations for 
AMP.T{T{X;v,d,F)) made by state evolution. 

IV. Empirical Validation 

The above-mentioned consequences of State Evolution can 
be tested as follows. In each case, we can use SE to make 
a fixed prediction in advance of an experiment and then we 
can run a simulation experiment to test the accuracy of the 
prediction. 

A. SE Predictions of Dynamics of Observables 

Exponential convergence of AMP-based algorithms is 
equivalent to saying that a certain observable - Mean-squared 
error of reconstruction - decays exponentially in t. This is but 
one observable of the algorithm's output; and we have tested 
not only the SE predictions of MSE but also the SE predictions 
of many other quantities. 



In Figure [T| we present results from an experiment with 
signal length N — 5000, noise level t; = 0, indeterminacy 
5 = n/N = 0.30 and sparsity level e = 0.045. The distribution 
F places 95.5% of its mass at zero and 4.5% of its mass at 
1. the fit between predictions and observations is extremely 
good - so much so that it is hard to tell the two curves apart. 
For more details, see 121. 

B. Phase Transition Calculations 

Empirical observations of Phase transitions of £i minimiza- 
tion and other algorithms have been made in \A\, f9\, and we 
follow a similar procedure. Specifically, to observe a phase 
transition in the performance of a sparsity-seeking algorithm, 
we perform 200 reconstructions on randomly-generated prob- 
lem instances with the same underlying situation {v = 0, 5, 
F) and we record the fraction of successful reconstructions in 
that situation. We do this for each member of a large set of 
situations by varying the undersampling ratio 5 and varying 
sparsity of F. More specifically, we define a ((5, p) phase 
diagram [0, 1] and consider a grid of sites in this domain with 
5 = .05, .10, . . . and p = .03, .06, . . . , .99. For each 6, p pair 
in this grid, we generate random problem instances having a 
fc-sparse solution sq, i.e. a vector having k ones and n ~ k 
zeros; here k = p ■ S ■ N. 

Defining success as exact recovery of sq to within a small 
fixed error tolerance, we define the empirical phase transition 
as occurring at the p value where the success fraction drops 
below 50%. For more details, see lHJ. 

Figure 2 depicts the theoretical phase transition predicted 
by State Evolution as well as the empirical phase transition 
of AMP.M((5) and a traditional iterative soft thresholding 
algorithm. In this figure N = 1000, and AMP.M((5) was run 
for T — 1,000 iterations. One can see that empirical phase 
transition of AMP.M((5) matches closely the state evolution 
prediction. One can also see that the empirical phase transition 
of iterative soft thresholding, without the message passing 
term, is substantially worse than that for the AMP-based 
method with the message passing term. 

C. Operating Characteristics of li penalized Least-squares 

The phase transition study gives an example of SE's accu- 
racy in predicting AMP-based algorithms in a strictly sparse 
setting, i.e. where only a small fraction of entries in sq are 
nonzero. For a somewhat different example, we consider the 
generalized Gaussian family, i.e. distribution functions F^ with 
densities 

/„(x)=exp(-|.Tr)/Z„. 

In the case a = 1 there is a very natural connection with £i- 
minimization algorithms, which then become MAP estimation 
schemes. In the case a = 1, an iid realization from /„, 
properly rescaled to unit li norm, will be uniformly distributed 
on the surface of the ball, and in that sense this distribution 
samples all of the £i ball, unlike the highly sparse distributions 
used in the phase transition study, which sample only the low- 
dimensional faces. When a < 1, the sequence is in a sense 



Comparison of Different Aigoritfims 

1 , 1 1 1 1 1 1 1 MSE(EII1PenEII2) and SE pred.; GenGauss Vary delta & p 




5 10 15 



0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 ID SE prediction 

6 



Fig. 2. Phase transitions of reconstruction algorithms. Blue Curve: 
Phase Transition predicted by SE; Red Curve empirical phase transition 
for AMPM{5) as observed in simulation; Green Curve, empirical phase 
transition for Iterative Soft Thresholding as observed in simulation. 

more sparse than when a = 1. The case a = .7 has been found 
useful in modelling wavelet coefficients of natural images. 

We considered exponents a £ {0.35,0.50,0.65,0.75,1.0}. 
At each such case we considered incompleteness ratios S E 
{0.1,0.2,0.3,0.4,0.5}. The set of resulting {a,S) pairs gives 
a collection of 25 experimental conditions. At each such 
experimental condition, we considered 5 or so different values 
of A for which SE-predicted MSB's were available. In total, 
simulations were run for 147 different combinations of a, S 
and A. At each such combination, we randomly generated 200 
problem instances using the problem specification, and then 
computed more than 50 observables of the solution. In this 
subsection, we used N = 500. 

To solve an instance of problem (|8]l we had numerous 
options. Rather than a general convex optimizer, we opted to 
use the LARS/LASSO algorithm. 

Figure 3 shows a scatterplot comparing MSB values for 
the LARS/LASSO solution of ^ with predictions by State 
Bvolution, as decribed in section III.C. Bach data point cor- 
responds to one experimental combination of a, 6, A, and the 
datapoint presents the median MSB across 200 simulations 
under that combination of circumstances. Bven though the 
observed MSB's vary by more than an order of magnitude, it 
will be seen that the SB predictions track them accurately. It 
should be recalled that the problem size here is only N = 500, 
and that only 200 replications were made at each experimental 
situation. In contrast, the SB prediction is designed to match 
large-system limit. In a longer paper, we will consider a much 
wider range of observables and demonstrate that, at larger 
problem sizes N, we get successively better fits between 
observables and their SB predictions. 
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Fig. 3. Mean-squared Error for ^i-penalized Least-squares estimate versus 
predicted error according to State Evolution. 
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